Letter Frequencies in the English Language
Relative frequencies of letters
| Letter (by letter) | Frequency | Letter (by frequency) | Frequency |
| --- | --- | --- | --- |
| a | 0.08167 | e | 0.12702 |
| b | 0.01492 | t | 0.09056 |
| c | 0.02782 | a | 0.08167 |
| d | 0.04253 | o | 0.07507 |
| e | 0.12702 | i | 0.06966 |
| f | 0.02228 | n | 0.06749 |
| g | 0.02015 | s | 0.06327 |
| h | 0.06094 | h | 0.06094 |
| i | 0.06966 | r | 0.05987 |
| j | 0.00153 | d | 0.04253 |
| k | 0.00772 | l | 0.04025 |
| l | 0.04025 | c | 0.02782 |
| m | 0.02406 | u | 0.02758 |
| n | 0.06749 | m | 0.02406 |
| o | 0.07507 | w | 0.02360 |
| p | 0.01929 | f | 0.02228 |
| q | 0.00095 | g | 0.02015 |
| r | 0.05987 | y | 0.01974 |
| s | 0.06327 | p | 0.01929 |
| t | 0.09056 | b | 0.01492 |
| u | 0.02758 | v | 0.00978 |
| v | 0.00978 | k | 0.00772 |
| w | 0.02360 | j | 0.00153 |
| x | 0.00150 | x | 0.00150 |
| y | 0.01974 | q | 0.00095 |
| z | 0.00074 | z | 0.00074 |
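Relative frequencies of this kind can be obtained by counting each letter in a sample text and dividing by the total number of letters. The following is a minimal sketch, not the method used to produce the figures above; the function name `letter_frequencies` and the sample sentence are assumptions made for illustration.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return {letter: relative frequency} for the letters a-z in text."""
    # Fold case and keep only the 26 unaccented ASCII letters.
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {letter: count / total for letter, count in counts.items()}


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog"  # hypothetical input
    freqs = letter_frequencies(sample)
    # Print in descending order of frequency, ties broken alphabetically.
    for letter, freq in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{letter} {freq:.5f}")
```

A single sentence is far too small to approach the values in the table; stable estimates need a large corpus.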
Top 10 beginning of word letters
| Letter | Frequency |
| --- | --- |
| t | 0.1594 |
| a | 0.155 |
| i | 0.0823 |
| s | 0.0775 |
| o | 0.0712 |
| c | 0.0597 |
| m | 0.0426 |
| f | 0.0408 |
| p | 0.040 |
| w | 0.0382 |
Top 10 end of word letters
| Letter | Frequency |
| --- | --- |
| e | 0.1917 |
| s | 0.1435 |
| d | 0.0923 |
| t | 0.0864 |
| n | 0.0786 |
| y | 0.0730 |
| r | 0.0693 |
| o | 0.0467 |
| l | 0.0456 |
| f | 0.0408 |
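Frequencies by position in the word, as in the tables above, can be estimated by collecting only the first (or last) letter of every word. This is a rough sketch under stated assumptions: a word is taken to be any maximal run of the letters a-z, and `edge_letter_frequencies` is a hypothetical name, not something from the source.

```python
from collections import Counter
import re


def edge_letter_frequencies(text, position="first"):
    """Relative frequency of each letter as the first or last letter of a word."""
    # Assumption: every maximal run of a-z is one word, so apostrophes
    # and hyphens split words.
    words = re.findall(r"[a-z]+", text.lower())
    index = 0 if position == "first" else -1
    counts = Counter(word[index] for word in words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: count / total for letter, count in counts.items()}


if __name__ == "__main__":
    sample = "the cat sat on the mat"  # hypothetical input
    last = edge_letter_frequencies(sample, position="last")
    # Ten most frequent word-final letters, highest first.
    for letter, freq in sorted(last.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{letter} {freq:.4f}")
```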
Most common bigrams (in order)
th, he, in, en, nt, re, er, an, ti, es, on, at, se, nd, or, ar, al, te, co, de, to, ra, et, ed, it, sa, em, ro.
Most common trigrams (in order)
the, and, tha, ent, ing, ion, tio, for, nde, has, nce, edt, tis, oft, sth, men.
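Entries such as edt, oft, and sth suggest that these trigrams were counted across word boundaries, with spaces and punctuation removed. That reading is an assumption, since the list does not state its method. Under that assumption, the counting could be sketched as follows (the name `ngrams_ignoring_spaces` is illustrative only):

```python
from collections import Counter


def ngrams_ignoring_spaces(text, n):
    """Count n-grams over the letter stream, ignoring spaces and punctuation."""
    # Keep only a-z so that adjacent words run together.
    stream = "".join(ch for ch in text.lower() if "a" <= ch <= "z")
    return Counter(stream[i:i + n] for i in range(len(stream) - n + 1))


counts = ngrams_ignoring_spaces("used the", 3)
# The stream is "usedthe", so "edt" is counted even though it is not inside any word.
print(counts.most_common())
```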
Results from Project Gutenberg
The corpus analyzed here is 9,481 English works (3.98 GiB) from Project Gutenberg: the extracted contents of the 2003 PG DVD, plain text files only, excluding the human genome project, non-English works, and duplicates in 7-bit-clean encoding. The common boilerplate text present in every file was stripped first so as not to skew the results. The analysis yielded the following frequencies of letters, bigrams, trigrams, and quadrigrams:
Letters
Of 3,104,375,038 letters scanned:
1. e (390395169, 12.575645%)
2. t (282039486, 9.085226%)
3. a (248362256, 8.000395%)
4. o (235661502, 7.591270%)
5. i (214822972, 6.920007%)
6. n (214319386, 6.903785%)
7. s (196844692, 6.340880%)
8. h (193607737, 6.236609%)
9. r (184990759, 5.959034%)
10. d (134044565, 4.317924%)
11. l (125951672, 4.057231%)
12. u (88219598, 2.841783%)
13. c (79962026, 2.575785%)
14. m (79502870, 2.560994%)
15. f (72967175, 2.350463%)
16. w (69069021, 2.224893%)
17. g (61549736, 1.982677%)
18. y (59010696, 1.900888%)
19. p (55746578, 1.795742%)
20. b (47673928, 1.535701%)
21. v (30476191, 0.981717%)
22. k (22969448, 0.739906%)
23. x (5574077, 0.179556%)
24. j (4507165, 0.145188%)
25. q (3649838, 0.117571%)
26. z (2456495, 0.079130%)
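Each percentage in the list is the letter's count divided by the total number of letters scanned. As a quick arithmetic check on the first entry, using only figures quoted above:

```python
total_letters = 3_104_375_038   # "Of 3,104,375,038 letters scanned"
count_e = 390_395_169           # count reported for "e"

# Share of all scanned letters that are "e", as a percentage.
percent_e = 100 * count_e / total_letters
print(f"e: {percent_e:.4f}%")   # compare with the listed 12.575645%
```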
Bigrams
Of 2,383,373,483 bigrams scanned:
1. th (92535489, 3.882543%)
2. he (87741289, 3.681391%)
3. in (54433847, 2.283899%)
4. er (51910883, 2.178042%)
5. an (51015163, 2.140460%)
6. re (41694599, 1.749394%)
7. nd (37466077, 1.571977%)
8. on (33802063, 1.418244%)
9. en (32967758, 1.383239%)
10. at (31830493, 1.335523%)
11. ou (30637892, 1.285484%)
12. ed (30406590, 1.275779%)
13. ha (30381856, 1.274742%)
14. to (27877259, 1.169655%)
15. or (27434858, 1.151094%)
16. it (27048699, 1.134891%)
17. is (26452510, 1.109877%)
18. hi (26033632, 1.092302%)
19. es (26033602, 1.092301%)
20. ng (25106109, 1.053385%)
Trigrams
Of 1,699,542,842 trigrams scanned:
1. the (59623899, 3.508232%)
2. and (27088636, 1.593878%)
3. ing (19494469, 1.147042%)
4. her (13977786, 0.822444%)
5. hat (11059185, 0.650715%)
6. his (10141992, 0.596748%)
7. tha (10088372, 0.593593%)
8. ere (9527535, 0.560594%)
9. for (9438784, 0.555372%)
10. ent (9020688, 0.530771%)
11. ion (8607405, 0.506454%)
12. ter (7836576, 0.461099%)
13. was (7826182, 0.460487%)
14. you (7430619, 0.437213%)
15. ith (7329285, 0.431250%)
16. ver (7320472, 0.430732%)
17. all (7184955, 0.422758%)
18. wit (6752112, 0.397290%)
19. thi (6709729, 0.394796%)
20. tio (6425262, 0.378058%)
Quadrigrams
Of 1,144,085,293 quadrigrams scanned:
1. that (8709261, 0.761242%)
2. ther (6916008, 0.604501%)
3. with (6565513, 0.573866%)
4. tion (6314428, 0.551919%)
5. here (4285164, 0.374549%)
6. ould (4232202, 0.369920%)
7. ight (3540253, 0.309440%)
8. have (3324067, 0.290544%)
9. hich (3252540, 0.284292%)
10. whic (3247213, 0.283826%)
11. this (3161481, 0.276333%)
12. thin (3093756, 0.270413%)
13. they (3002324, 0.262421%)
14. atio (3001919, 0.262386%)
15. ever (2982572, 0.260695%)
16. from (2958372, 0.258580%)
17. ough (2899649, 0.253447%)
18. were (2643859, 0.231089%)
19. hing (2630750, 0.229944%)
20. ment (2555284, 0.223347%)
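The totals above shrink as n grows (fewer bigrams than letters, fewer trigrams than bigrams, and so on), which is consistent with n-grams being counted inside words only, never across a space. That is an inference from the totals, not something the source states. A minimal sketch under that assumption, printing in the same `rank. ngram (count, percent%)` layout as the lists above; `ngrams_within_words` and `report` are hypothetical names:

```python
from collections import Counter
import re


def ngrams_within_words(text, n):
    """Count n-grams that lie entirely inside a word (a run of a-z)."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        # A word of length L contributes L - n + 1 n-grams (none if L < n).
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def report(counts, top=20):
    """Format the most common n-grams as 'rank. ngram (count, percent%)' lines."""
    total = sum(counts.values())
    lines = []
    for rank, (gram, count) in enumerate(counts.most_common(top), start=1):
        lines.append(f"{rank}. {gram} ({count}, {100 * count / total:.6f}%)")
    return lines


if __name__ == "__main__":
    sample = "the other day they gathered there"  # hypothetical input
    for line in report(ngrams_within_words(sample, 2), top=5):
        print(line)
```

Under this scheme a one-letter word such as "a" adds to the letter total but to none of the n-gram totals, which is one way the totals could fall off as they do.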